User-Assisted Archive Document Image Analysis for Digital Library Construction

Authors

  • Jingyu He
  • Andy C. Downton
Abstract

A configurable archive document image analysis system for digital library construction has been designed using rapid prototyping and top-down iterative development methods. This approach proved essential for capturing the curators’ expertise about existing card archive structures, content and databases. The design currently achieves about 93% correct segmentation of the required archive card fields overall, with 81.3% of all archive cards in a test set of 2000 images having all fields correctly segmented and labelled. Analysis of errors in the test set indicates that heavily annotated cards and non-standard card formats comprise 5-10% of the overall archive, and a significant proportion of these are unlikely to be resolvable without curatorial intervention.

Similar resources

A web2.0 collaborative cultural heritage archive with recommender system over trace based reasoning

Cultural heritage presents a big quantity of information; they entice different kinds of persons. In last decades, computer technology and internet helped bringing history to present life. Ancient and historical documents were digitized and exposed online. Therefore, cultural heritage digital libraries and web sites were created, first to enhance document preservation, and second to facilitate ...

Document Icons and Page Thumbnails: Issues in Construction of Document Thumbnails for Page-Image Digital Libraries

Digital libraries are increasingly based on digital page images, but techniques for constructing usable versions of these page images are largely folklore. This paper documents some issues encountered in creating various kinds of renderings of page images for the UpLib digital library system, and suggests approaches for each, based on both problem analysis and user feedback. Several factors imp...

A method of content-based image retrieval for a spinal x-ray image database

The Lister Hill National Center for Biomedical Communications, a research and development division of the National Library of Medicine (NLM), maintains a digital archive of 17,000 cervical and lumbar spine images collected in the second National Health and Nutrition Examination Survey (NHANES II) conducted by the National Center for Health Statistics (NCHS). Classification of the images for the...

Learning Document Image Features With SqueezeNet Convolutional Neural Network

The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...

Ensuring Retrieval Effectiveness in Distributed Digital Libraries

We find that dissemination of collection-wide information (CWI) in a distributed collection of documents is needed to achieve retrieval effectiveness comparable to that of a centralized collection. Complete dissemination is unnecessary. The ...

  • collection management;
  • organizing and indexing the materials for storage and retrieval;
  • user interfaces and human-computer interaction; and
  • interope...

Publication date: 2003